    An Improved Metric Space for Pixel Signatures

    How good are your fits? Unbinned multivariate goodness-of-fit tests in high energy physics

    Multivariate analyses play an important role in high energy physics. Such analyses often involve performing an unbinned maximum likelihood fit of a probability density function (p.d.f.) to the data. This paper explores a variety of unbinned methods for determining the goodness of fit of the p.d.f. to the data. The application and performance of each method are discussed in the context of a real-life high energy physics analysis (a Dalitz-plot analysis). Several of the methods presented in this paper can also be used for the non-parametric determination of whether two samples originate from the same parent p.d.f. This can be used, e.g., to determine the quality of a detector Monte Carlo simulation without the need for a parametric expression of the efficiency. Comment: 32 pages, 12 figures
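
The two-sample question raised in this abstract (do two unbinned multivariate samples share a parent p.d.f.?) can be illustrated with a generic permutation test. The sketch below uses an energy-distance statistic chosen purely for illustration; it is an assumption, not necessarily the formulation used in the paper, and the function names `energy_statistic` and `permutation_p_value` are hypothetical.

```python
import numpy as np


def energy_statistic(x, y):
    """Energy distance between samples x of shape (n, d) and y of shape (m, d)."""
    def mean_dist(a, b):
        # Mean Euclidean distance over all pairs (one point from a, one from b).
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def permutation_p_value(x, y, n_perm=200, seed=0):
    """Estimate a p-value by randomly reassigning the pooled points to two groups."""
    rng = np.random.default_rng(seed)
    observed = energy_statistic(x, y)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        if energy_statistic(pooled[perm[:n]], pooled[perm[n:]]) >= observed:
            count += 1
    # The +1 terms count the observed labelling itself, so the p-value is never 0.
    return (count + 1) / (n_perm + 1)
```

A small p-value suggests the two samples differ, for example data versus a detector simulation. The pairwise distance matrices make the cost quadratic in the sample size, so this form only suits modest samples.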

    Markov basis and Groebner basis of Segre-Veronese configuration for testing independence in group-wise selections

    We consider testing independence in group-wise selections with some restrictions on combinations of choices. We present models for frequency data of selections for which it is easy to perform conditional tests by Markov chain Monte Carlo (MCMC) methods. When the restrictions on the combinations can be described in terms of a Segre-Veronese configuration, an explicit form of a Gröbner basis consisting of moves of degree two is readily available for performing a Markov chain. We illustrate our setting with the National Center Test for university entrance examinations in Japan. We also apply our method to testing independence hypotheses involving genotypes at more than one locus or haplotypes of alleles on the same chromosome. Comment: 25 pages, 5 figures

    Non-linear regression models for Approximate Bayesian Computation

    Approximate Bayesian inference on the basis of summary statistics is well-suited to complex problems for which the likelihood is either mathematically or computationally intractable. However, the methods that use rejection suffer from the curse of dimensionality when the number of summary statistics is increased. Here we propose a machine-learning approach to the estimation of the posterior density by introducing two innovations. The new method fits a nonlinear conditional heteroscedastic regression of the parameter on the summary statistics, and then adaptively improves estimation using importance sampling. The new algorithm is compared to the state-of-the-art approximate Bayesian methods, and achieves considerable reduction of the computational burden in two examples of inference in statistical genetics and in a queueing model. Comment: 4 figures; version 3 minor changes; to appear in Statistics and Computing
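
To make the idea of regressing the parameter on the summary statistics concrete, here is a minimal sketch of rejection ABC followed by a regression adjustment. It deliberately swaps in a plain linear, homoscedastic adjustment in place of the nonlinear heteroscedastic regression and importance sampling described in the abstract, and it handles a single scalar summary statistic only. The name `abc_regression_adjust` and its parameters are hypothetical.

```python
import numpy as np


def abc_regression_adjust(observed_stat, prior_draws, simulate_stat, accept_frac=0.1):
    """Rejection ABC with a linear regression adjustment (scalar summary statistic).

    prior_draws   : 1-D array of parameter values drawn from the prior
    simulate_stat : callable mapping a parameter value to one simulated summary
    Returns (accepted draws, regression-adjusted draws).
    """
    stats = np.array([simulate_stat(theta) for theta in prior_draws])
    dist = np.abs(stats - observed_stat)

    # Rejection step: keep the fraction of simulations closest to the observation.
    keep = dist <= np.quantile(dist, accept_frac)
    theta_acc, stat_acc = prior_draws[keep], stats[keep]

    # Least-squares fit of theta = a + b * (s - s_obs) on the accepted draws.
    design = np.column_stack([np.ones(theta_acc.size), stat_acc - observed_stat])
    coef, *_ = np.linalg.lstsq(design, theta_acc, rcond=None)

    # Shift each accepted draw to where the fit places it at s = s_obs.
    adjusted = theta_acc - coef[1] * (stat_acc - observed_stat)
    return theta_acc, adjusted
```

In a toy setting, `prior_draws` could be uniform draws for a normal mean and `simulate_stat` the mean of a simulated sample. The adjusted draws are then typically more concentrated than the raw accepted ones, which is what allows a looser acceptance threshold.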

    Multi-objective optimisation for receiver operating characteristic analysis

    Copyright © 2006 Springer-Verlag Berlin Heidelberg. The final publication is available at link.springer.com. Book title: Multi-Objective Machine Learning. Summary: Receiver operating characteristic (ROC) analysis is now a standard tool for the comparison of binary classifiers and the selection of operating parameters when the costs of misclassification are unknown. This chapter outlines the use of evolutionary multi-objective optimisation techniques for ROC analysis, in both its traditional binary classification setting, and in the novel multi-class ROC situation. Methods for comparing classifier performance in the multi-class case, based on an analogue of the Gini coefficient, are described, which leads to a natural method of selecting the classifier operating point. Illustrations are given concerning synthetic data and an application to Short Term Conflict Alert.
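
For orientation, the binary-case quantities this summary builds on can be sketched as follows: an ROC curve obtained by sweeping a threshold over classifier scores, and the Gini coefficient taken as twice the area under that curve minus one. This covers only the standard binary setting, not the chapter's multi-class analogue or its evolutionary optimisation. The function names are hypothetical, and tied scores are not given special treatment.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (fpr, tpr) arrays as the threshold sweeps from high to low scores.

    labels must contain 1 for positives and 0 for negatives, with both present.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-scores, kind="stable")
    sorted_labels = labels[order]
    tps = np.cumsum(sorted_labels == 1)  # true positives at each cut
    fps = np.cumsum(sorted_labels == 0)  # false positives at each cut
    tpr = np.concatenate([[0.0], tps / tps[-1]])
    fpr = np.concatenate([[0.0], fps / fps[-1]])
    return fpr, tpr


def gini_coefficient(scores, labels):
    """Gini = 2 * AUC - 1, with the AUC computed by the trapezoidal rule."""
    fpr, tpr = roc_points(scores, labels)
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    return 2.0 * auc - 1.0
```

A perfectly ranking classifier gives a Gini coefficient of 1, a random ranking gives roughly 0, and a perfectly inverted ranking gives -1.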